Curriculum and Translation Differential Item Functioning:

A Comparison of Two DIF Detection Techniques 
Introduction 

When tests are given in different languages, it is usual to worry about translation-related differential item
functioning (DIF). Indeed, previous studies have stressed the importance of identifying sources
of translation DIF (Allalouf, 2000; Allalouf, Hambleton, & Sireci, 1999; Gierl & Khaliq, 2001; Gierl,
Rogers, & Klinger, 1999). However, translation problems may not explain every item
that exhibits DIF.

Many studies (e.g., Huang, 1998; Mehrens & Phillips, 1986; Miller & Linn, 1988) suggest that
the degree of match between an assessment and the taught curriculum can have a large impact on
achievement test scores. Huang (1998), for instance, showed that controlling the error variance due to
curriculum sampling slightly decreased the rate of item classification inconsistency, and asserted that the
"finding suggested that different school curriculum may play a role in the differences found in student
performance" (p. 13). Porter, Schmidt, Floden and Freeman (1978) assert that state or provincial decision
makers need to be keenly aware of the relationship between tests and curricula, in addition to translation
issues, if they are to make sense of test results from schools using different languages. This is especially
so where instruction occurs in different languages and curriculum-aligned materials are unequally
available across language groups. Even items that are translated very accurately may still
behave differently because the curriculum is defined or taught differently in the two
language groups. The School Achievement Indicators Program (SAIP) mathematics test results from
Ontario provide a unique opportunity to study translation and curriculum DIF, because we know
something about the differences in curricula between French- and English-language schools.

There does not seem to be agreement among psychometric professionals on the statistical
procedures best suited for DIF detection. The psychometric literature is rich with studies that use one
approach or another or, in some cases, more than one approach to investigate DIF (Narayanan &
Swaminathan, 1996; Prieto, Barbero, & San Luis, 1997; Raju, Drasgow, & Slinde, 1993; Shepard,
Camilli, & Averill, 1981; Whitmore & Schumacker, 1999), and the results sometimes point in different
directions. However, Prieto, Barbero, and San Luis (1997) observed that the most promising approaches
appear to be the Mantel-Haenszel (M-H) procedure and those based on the fundamentals of item response
theory (IRT). The IRT methods are theoretically preferred but are computationally intensive and require a
minimum sample size of 1,000 examinees and a test length of 40 items (Shepard, Camilli, & Averill, 1981;
Raju, 1988, 1990). The M-H approach, in contrast, has become popular because it is easy to
implement and has an associated test of statistical significance (Prieto, Barbero, & San Luis, 1997). While
Hambleton and Rogers (1989), Swaminathan and Rogers (1990), and Narayanan and Swaminathan
(1996) showed that the M-H procedure was not very effective in identifying nonuniform DIF items,
Prieto, Barbero, and San Luis (1997) found that the M-H procedure may be effective for detecting a
relatively high proportion of nonuniform DIF. In an empirical comparison study, Raju, Drasgow, and
Slinde (1993) found close agreement between IRT-based procedures and the M-H
procedure in a female-male comparison, whereas the two procedures identified different items as exhibiting
DIF in their Black-White comparison. McLaughlin and Drasgow (1987) and Lim and Drasgow
(1990) showed that Lord's chi-square statistic leads to incorrect and misleading DIF results when it is used
with joint maximum likelihood estimates of ability parameters, but yields more nearly accurate DIF results
when used with marginal maximum likelihood or Bayes modal estimates. In short, the search for effective
methods of identifying DIF items remains inconclusive. Rogers' (1989) proposal for further studies of these
methods when the focal and reference groups have unequal ability, when sample size is small, or when test
length is short is still relevant.

Objectives of the Study 

The objectives of this study, therefore, are

1. to use both item response theory (IRT) and Mantel-Haenszel (MH) approaches to look for
DIF items,

2. to compare the numbers and patterns of items identified by the two approaches,

3. to examine the degree to which these patterns are related to the differences we expect to find
given what we know about differences in the curricula, and

4. to explore what may be at the root of DIF in this large-scale assessment and how significant
results may be interpreted.

Educational Importance of the Study 

Muthén, Kao, and Burstein (1991) have called attention to the instructional sensitivity of test items
that arises from differences in opportunity to learn. Citing Linn and Harnisch (1981), they warned against
mistakenly attributing DIF due to instructional bias to other sources such as ethnicity. As well, an earlier
study by Shen (2000) revealed the influence of curriculum type on differential item functioning. However,
as observed by Sireci and Swaminathan (1996), a DIF study comparing two language groups of examinees
who take different language versions of the test items must contend with two problems at once: the effects
of curriculum differences must be separated from the effects of item language differences.

The unique contribution of the present study is that it compares the IRT and MH
approaches as tools for untangling DIF in tests that may contain both translation and curriculum DIF. It
further illustrates how possible sources of DIF can be identified using both a general IRT approach
and the Mantel-Haenszel approach, and how the results of the two approaches compare.

Method 

Data 

This study uses data from the content subtest of the 2001 SAIP Mathematics Assessment. 

Students who wrote the content subtest received a booklet containing 27 background questions, 15
multiple-choice placement items, and 110 short-answer and multiple-choice items ordered by difficulty and
grouped into five sections. Of the 125 items on the test (the 15 placement items plus the 110 that follow
them), 75 were multiple-choice items with four response options and 50 were short-answer items.

Students were first administered the placement items, which an exam proctor scored immediately. Based
on the score on the placement items, each student was told to continue the test from one of
three “starting points” (Item 16, Item 41, or Item 66) and to work as far as possible within the test time
(CMEC, 2001).

The content subtest assesses student achievement in four areas: (1) Numbers and Operations, (2) 
Algebra and Functions, (3) Measurement and Geometry, and (4) Data Management and Statistics. The 
Measurement and Geometry (M&G) items are the focus of this study. The content subtest has a total of 
31 M&G items; however, four of these are in the placement test. This study examines the 27 M&G items 
that follow the placement test. 

Sample 

For the administration of the 2001 SAIP Mathematics Assessment, students were sampled from
each participating province and, within some provinces, by language of instruction. Each student took
either the content subtest or the problem-solving subtest, but not both.

To facilitate the examination of DIF caused by curriculum and language differences, it was 
important to define two relatively homogeneous groups for comparison. The Ontario students in English- 
language and in French-language schools were selected. 

Of the Ontario students who took the test, those who omitted all 15 of the placement test items or
did not provide their age were excluded. The resulting dataset consisted of 793 13-year-old and 677
16-year-old students from English-language schools and 487 13-year-old and 546 16-year-old students from
French-language schools. Because only about 5% of Ontario students are enrolled in French-language
schools, students from Ontario’s French-language schools were deliberately over-sampled in the 2001
SAIP data collection to provide a number large enough for group analyses. Consequently, a student
in a French-language school was almost 15 times as likely as a student in an English-language school to
participate in the 2001 SAIP Mathematics Assessment.

For the analyses, in addition to dividing students by age and language, it was necessary to divide 
the students by starting point in order to minimise the missing data in each analysis. The placement test 
items were not included in the analyses because performance on those items showed very little variability, 
particularly for students assigned to begin at Starting Points 2 (Item 41) and 3 (Item 66). For the students 
beginning at each starting point, only the M&G items falling within the 60 items following that starting
point were analysed, as few students were able to work farther than that within the time allowed. 

DIF Analyses 

Mantel-Haenszel (M-H) approach. An M-H chi-square (Holland & Thayer, 1988) was computed
for each item using the EZDIF program (Waller, 1998). The M-H procedure tests whether, when groups
are matched on ability or achievement, the odds of answering an item correctly at a given score level are
independent of group membership. This involves obtaining, for each item, an odds ratio, $\alpha_i$, that
compares the odds of a correct response in the reference and focal groups:



$$\alpha_i = \frac{p_{ri}\, q_{fi}}{p_{fi}\, q_{ri}} \tag{1}$$



where 



$p_{ri}$ and $q_{ri}$ represent the proportions in the reference group that responded correctly and incorrectly to
the $i$th item, and

$p_{fi}$ and $q_{fi}$ represent the proportions in the focal group that responded correctly and incorrectly
to the $i$th item.

A chi-square distribution with df = 1 and α = .01 was used to identify items with significant DIF.
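In practice, the odds ratio in Equation 1 is computed within a 2 × 2 table (group by correct/incorrect) at each level of the matching score and then pooled across levels. The Python sketch below shows that computation, assuming the standard Holland and Thayer (1988) formulas for the common odds ratio and the continuity-corrected chi-square; the function name `mantel_haenszel` and the group codes `"R"`/`"F"` are illustrative assumptions, and the sketch is not a description of how EZDIF works internally. With the orientation used here, a value of α_MH greater than 1 means that, after matching, the item is relatively easier for the reference group.

```python
import math
from collections import defaultdict


def mantel_haenszel(scores, groups, item):
    """Mantel-Haenszel DIF statistics for one studied item.

    scores: matching score for each examinee (e.g., a total test score)
    groups: "R" for the reference group, "F" for the focal group
    item:   1 if the examinee answered the studied item correctly, else 0
    Returns (alpha_mh, chi2_mh, p_value); alpha_mh > 1 means the item
    favours the reference group after matching.
    """
    # One 2x2 table per score level, stored as [A, B, C, D]:
    # A/B = reference correct/incorrect, C/D = focal correct/incorrect.
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for s, g, u in zip(scores, groups, item):
        cell = (0 if u == 1 else 1) + (0 if g == "R" else 2)
        tables[s][cell] += 1

    num = den = sum_a = sum_expected = sum_var = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        if n < 2:
            continue  # the hypergeometric variance is undefined for n < 2
        n_ref, n_foc = a + b, c + d        # group sizes at this score level
        n_right, n_wrong = a + c, b + d    # item totals at this score level
        num += a * d / n
        den += b * c / n
        sum_a += a
        sum_expected += n_ref * n_right / n
        sum_var += n_ref * n_foc * n_right * n_wrong / (n * n * (n - 1))

    alpha = num / den if den > 0 else math.inf
    # Continuity-corrected M-H chi-square, referred to chi-square with 1 df
    dev = max(0.0, abs(sum_a - sum_expected) - 0.5)
    chi2 = dev ** 2 / sum_var if sum_var > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square(1)
    return alpha, chi2, p_value


if __name__ == "__main__":
    # Hypothetical examinees at two score levels (not SAIP data)
    scores = [1] * 100 + [2] * 100
    groups = (["R"] * 50 + ["F"] * 50) * 2
    item = ([1] * 30 + [0] * 20 + [1] * 20 + [0] * 30
            + [1] * 40 + [0] * 10 + [1] * 30 + [0] * 20)
    alpha, chi2, p = mantel_haenszel(scores, groups, item)
    print(f"alpha_MH = {alpha:.3f}, chi2_MH = {chi2:.3f}, p = {p:.4f}")
```

The resulting p-value would then be compared with the chosen significance level to decide whether to flag the item.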

Item Response Theory (IRT) approach. IRT provides a class of models that describe the
relationship between latent ability and the probability of correctly answering an item. This probability is
determined by quantities referred to as item parameters, which may include one, two, or all three of
item discrimination, item difficulty, and a guessing parameter. In the most general case, where all three
parameters apply, the model is the three-parameter logistic (3PL) model, which is expressed as



$$P_i(\theta) = c_i + \frac{1 - c_i}{1 + e^{-a_i(\theta - b_i)}} \tag{2}$$



where 

$c_i$ is the probability of getting the item right just by guessing,
$b_i$, the location of the trace line, is the difficulty parameter,
$a_i$, the slope of the trace line, is the discrimination parameter, and

$\theta$ is the latent ability or proficiency.

When $c_i$ is fixed to zero, the 3PL model simplifies to the 2PL model.
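Equation 2 can be evaluated directly. The minimal Python sketch below assumes the parameterisation shown in Equation 2, with no scaling constant (some programs multiply the exponent by D ≈ 1.7); the function name `p_correct` and the parameter values in the usage example are hypothetical. Setting `c = 0` gives the 2PL model. A quick check on the formula: at θ = b_i the probability equals (1 + c_i)/2, so for a 2PL item an examinee whose ability equals the item's difficulty answers correctly half the time.

```python
import math


def p_correct(theta, a, b, c=0.0):
    """Probability of a correct response under the 3PL model (Equation 2).

    theta: latent ability; a: discrimination; b: difficulty (location);
    c: lower asymptote ("guessing"). With c = 0 this is the 2PL model.
    No scaling constant D is used here (an assumption of this sketch).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


if __name__ == "__main__":
    # Hypothetical parameters: a short-answer item (2PL) and a
    # multiple-choice item (3PL) sharing the same a and b.
    for theta in (-2.0, 0.0, 2.0):
        p_sa = p_correct(theta, a=0.8, b=0.0)
        p_mc = p_correct(theta, a=0.8, b=0.0, c=0.2)
        print(f"theta={theta:+.1f}  2PL={p_sa:.3f}  3PL={p_mc:.3f}")
```

Because the 3PL curve never falls below c, low-ability examinees retain some chance of success on multiple-choice items, which is why the guessing parameter is included for those items but not for the short-answer items.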

A number of DIF detection procedures are classified as IRT methods. Among them are Lord’s
chi-square statistic, the signed area measure, and the unsigned area measure (Raju, Drasgow, & Slinde,
1993), as well as the model comparison approach described by Thissen, Steinberg, and Wainer (1993).
Marginal maximum likelihood (MML) item parameter estimates were computed using MULTILOG
(Thissen, 1991). As described above, the content subtest includes both multiple-choice and short-answer
items. The 3PL model was considered appropriate for the multiple-choice items and the 2PL model for the
short-answer items. Omitted items within the 60 items following a student’s assigned starting point were
counted as wrong.
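The scoring rule can be made concrete with a short sketch. It assumes that "the 60 items following" a starting point means the starting item plus the next 59, so that the window for Starting Point 3 (Item 66) ends exactly at the last item, Item 125. It also assumes that items the student never reached inside the window are treated like omitted items. The function `score_window`, the response coding (1, 0, or `None` for omitted), and the optional `items_of_interest` filter (for example, the M&G items) are hypothetical.

```python
def score_window(responses, start, window=60, items_of_interest=None):
    """Score one student's responses for a single starting-point analysis.

    responses: dict mapping item number -> 1 (correct), 0 (wrong), None (omitted).
               Items the student never reached may simply be absent.
    start: the student's assigned starting item (16, 41, or 66).
    window: number of items analysed, counting the starting item (assumption).
    items_of_interest: optional set of item numbers to keep (e.g., M&G items).

    Returns a dict of 0/1 scores for items inside the window only; items
    outside the window are excluded rather than scored.
    """
    scored = {}
    for item in range(start, start + window):
        if items_of_interest is not None and item not in items_of_interest:
            continue
        # Omitted or unreached items inside the window count as wrong
        scored[item] = 1 if responses.get(item) == 1 else 0
    return scored


if __name__ == "__main__":
    # Hypothetical student assigned to Starting Point 3
    student = {66: 1, 67: None, 70: 0}
    print(score_window(student, 66, items_of_interest={66, 67, 70, 101}))
```

Excluding items outside the window, rather than scoring them as wrong, keeps each analysis limited to the items a student at that starting point could reasonably have reached.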

To examine items for DIF, the process described by Thissen, Steinberg and Wainer (1993) for 
comparison of two models was followed. An augmented model (A) includes all the parameters of a 
compact model (C) plus additional parameters. The analysis tests whether the additional parameters in the 
augmented model result in a significant improvement in model fit. The test of significance is of the form 

$$G^2(df) = 2 \log \frac{\text{Likelihood}[A]}{\text{Likelihood}[C]} \tag{3}$$



where Likelihood[·] represents the likelihood of the data given the maximum likelihood estimates of the
model, and df is the difference between the number of parameters in the augmented model and the number
of parameters in the compact model. Under very general assumptions, the value of $G^2(df)$ is distributed as
$\chi^2(df)$ under the null hypothesis. Thus, if the value of $G^2(df)$ is large, representing an unlikely value from a
$\chi^2(df)$ distribution, we reject the null hypothesis and the compact model (Thissen, Steinberg, & Wainer,
1993).
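Equation 3 is simply twice the difference between the two models' log-likelihoods. If a program reports −2 log likelihood, G² is the compact model's value minus the augmented model's value. The Python sketch below computes G² and its upper-tail χ² probability using only the standard library; the function names and the log-likelihood values in the usage example are made up for illustration. It also assumes that the augmented model frees every parameter of the studied item, which would give df = 3 for a multiple-choice (3PL) item and df = 2 for a short-answer (2PL) item; the degrees of freedom actually used are not stated above.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1.

    Uses the recurrence for the regularized upper incomplete gamma function,
    Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1) with y = x / 2,
    starting from Q(1, y) = exp(-y) (even df) or Q(1/2, y) = erfc(sqrt(y)) (odd df).
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        s, q = 1.0, math.exp(-y)
    else:
        s, q = 0.5, math.erfc(math.sqrt(y))
    while s < df / 2.0:
        q += math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
        s += 1.0
    return q


def model_comparison(loglik_augmented, loglik_compact, df):
    """G^2 = 2 log(L[A] / L[C]) = 2 (log L[A] - log L[C]) and its p-value."""
    g2 = 2.0 * (loglik_augmented - loglik_compact)
    return g2, chi2_sf(g2, df)


# Assumed df if every parameter of the studied item is freed in model A
DF_BY_ITEM_TYPE = {"multiple_choice_3pl": 3, "short_answer_2pl": 2}

if __name__ == "__main__":
    # Made-up log-likelihoods for one studied multiple-choice item
    g2, p = model_comparison(-1000.0, -1004.0, DF_BY_ITEM_TYPE["multiple_choice_3pl"])
    print(f"G2 = {g2:.2f}, p = {p:.4f}")
```

In this made-up example, G² = 8 on 3 df gives p of roughly .046, which would be significant at the .05 level but not at the .01 level.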

To place the item parameters of the focal and reference groups on the same metric, the anchor
test method (Camilli & Shepard, 1994), in which item parameters for both groups are estimated
simultaneously, was used. This approach requires constraining the parameters of the anchor items to be
identical for the two groups while the parameters of one item, the studied item, are allowed to vary. In the
compact model, the item parameters were constrained to be equal across the English- and French-language
versions for all items. In the augmented model, this constraint was relaxed for the studied item only, so
that its parameters could differ between the two versions while all other items served as anchors. If the fit
of the augmented model was significantly better than that of the compact model, that is, if allowing separate
parameters for the two versions of the item significantly improved fit, the item was considered to exhibit DIF.

Results and Discussion 

Items Exhibiting DIF 

Thirteen of the 27 Measurement and Geometry items were flagged as exhibiting DIF by the
Mantel-Haenszel approach (see Table 1). Six of these 13 items were also flagged by the IRT approach,
and the IRT approach flagged no additional items. One of the items (Item 33) was flagged for both
13- and 16-year-old students assigned to Starting Point 1. No other items were flagged for both ages or
for multiple starting points.

Translation 

Each item on the SAIP was developed in either English or French and was then translated into the 
other language. As the report of the 1997 mathematics assessment (CMEC, 1997) describes, 

A linguistic analysis of each question and problem was also conducted to make sure 
French and English items functioned in the same manner. For the marking sessions, 
francophone and anglophone coders were jointly trained and did the marking together in 
teams working in the same rooms. (p. 4)

We would expect these efforts to minimise the possible sources of translation DIF. As other studies (e.g.,
Allalouf, Hambleton, & Sireci, 1999; Gierl & Khaliq, 2001) have shown, however, it is very difficult to
achieve perfect agreement in the meaning and vocabulary difficulty of translated materials.
One challenge for students in French-language schools across Canada is that vocabulary use differs
between provinces. Because the French-speaking communities outside Quebec tend to be small
and often isolated, the evolution of French words in these communities does not always parallel that in
Quebec. The French-language versions of most SAIP mathematics items were created and approved by
individuals speaking French as it is spoken in Quebec. It is possible that some of the vocabulary used was
less familiar, and therefore more difficult, for Ontario students in French-language schools than for Quebec
students. Furthermore, if the focus was implicitly on ensuring that items were of the same difficulty for
students in English-language schools outside Quebec and students in French-language schools in Quebec,
the students in French-language schools in Ontario may well be disadvantaged in comparison to students
in English-language schools in Ontario. If this were the main source of DIF, however, the items exhibiting
DIF should favour students in English-language schools. In fact, seven of the 13 items favoured the
students in French-language schools.

Beyond vocabulary difficulty, there may be more subtle translation differences, and a review of the
items suggests some possibilities. For example, Item 35 requires identifying “lines” or “lignes” in
a drawing. Some of the lines are curved, and it may be that students taking the French-language version
were more likely to assume that only straight lines were to be counted as lines. Item 49 involves the edges,
faces, and vertices of a three-dimensional object. This item was significantly more difficult for the
16-year-old students taking the English-language version of the test. An examination of the responses
indicates that 10% of the students taking the test in French confused faces and vertices in their answer,
while more than 20% of those taking it in English made this mistake. It seems that the term “side” is often
used instead of “face” in the materials for English-language students, which may have contributed to
the confusion.

Curriculum 

In 1999, Ontario introduced a new high school curriculum. The curriculum was introduced for
that year’s Grade 9 students but did not apply to earlier cohorts of students, that is, students in Grade 10
or higher in 1999. According to Ontario’s “context statement,” at the time of the 2001 SAIP Mathematics
Assessment,

“. . . most 13-year-old students were enrolled in either grade 8 or grade 9 mathematics both of
which are mandatory core subjects in the new curriculum . . . [however] most of the 16-year-old
students in the assessment would have been studying the old mathematics curriculum and taking a
grade 11 course at one of the three possible levels of difficulty or would have taken no
mathematics course since grade 10” (CMEC, 2001, p. 57).

The new curriculum differs from the old both in its content and in the process used to develop it.
Before 1997, the provincial mathematics curriculum in Ontario was developed in English, then translated
into French (Ontario Ministry of Education, 1985a, 1985b). As a result, the defined curriculum contained
the same content in both languages, although differences existed in how content was presented in
textbooks and other resource materials, and different resources were available in English and in French.
The post-1997 mathematics curriculum (Ontario Ministry of Education, 1997a, 1997b, 1999a, 1999b,
2000a, 2000b), in contrast, was developed separately for French-language and English-language schools.
The curriculum development teams worked in parallel, and most of the expectations cover the same
content. However, a few expectations differ, as the following examples illustrate. The
English-language curriculum explicitly states that both Grade 9 Academic and Applied courses will
expect students to “substitute into and evaluate algebraic expressions involving exponents, to support
other topics of the course (e.g., measurement, analytic geometry)”. The French-language curriculum,
however, does not have this expectation (although this content might be implied by some of the other
expectations). Additionally, in French-language Grade 9 Applied courses, students must “communiquer
les étapes de la résolution de problèmes et les justifier” [communicate the steps of problem solving and
justify them] and “vérifier la solution d'une équation” [verify the solution of an equation], while
the same expectations are not explicit for French Academic or English Applied or Academic courses. As
these examples illustrate, the differences are not large, but they may well result in different content being
taught, and with different emphasis. It may, in fact, be that the differences in curricula reflect differences
that already existed when the curricula were developed.
The materials to which classroom teachers have access also differ. Many more textbooks are
available in English in Ontario. However, French-language school boards have often collaborated in
developing curriculum-linked materials for classroom teachers. Because fewer textbooks and other
resources are available to teachers in French-language schools, these board-developed materials are very
detailed, and with little else available, teachers tend to rely on them. As a result, teachers
of mathematics in French-language schools are more consistent across the province in the content and
style of the instruction they deliver than are teachers in English-language schools.

Differences in curricula and practice between French- and English-language schools are very
difficult to separate from language differences. To further complicate things, it is not clear to what extent
teachers in either language are teaching the new curriculum. In fact, some of the SAIP items are not
covered explicitly in the new curriculum (Ontario Ministry of Education, 2002) and so might favour
students still being taught under the old curriculum. For example, Item 49, which involves edges, faces,
and vertices of a three-dimensional object, presents a problem in that the term “edges” is not used in the
new curriculum. Item 84 requires solving a problem relating the dimensions of a cylinder to its volume;
this content is in the new Grade 9 curriculum, so the 13-year-old students taking the test may not yet
have been familiar with it. Item 101 involves the lengths of, and angles between, chords of a circle, content
that was included in neither Ontario’s old nor its new curriculum. Similar mismatches between item content
and curriculum were identified by Schmidt, Wolfe, and Kifer (1992) on the Second International
Mathematics Study (SIMS) and by Lawson, Bordignon, and Nagy (2002) on the Third International
Mathematics and Science Study (TIMSS).

Differences in item functioning may also have to do with what students know in areas not
directly related to the curriculum. Item 35, for example, used a map of the Quebec City area as its graphic
and was slightly easier for students taking the French-language version.
Test-Taking Approach

This study set out to examine translation and curriculum as sources of DIF in the SAIP 
Mathematics Assessment. However, the analyses suggest an additional possible source of DIF for these 
items: students’ test-taking approaches. 

As Figure 1 shows for students beginning at Starting Point 2, students taking the English-language
version of the test attempted more items than students taking the French-language version did. This was
true for both 13- and 16-year-old students and for all three starting points, which makes interpretation of
the results difficult. According to Table 1, the items later in the test tended to favour students taking the
English-language version; however, more students taking that version attempted these items. In fact, the
percentages of students answering each item correctly out of those who attempted it are higher for
students taking the French-language version, indicating that they were more likely, if they responded to
an item, to respond correctly. Different explanations are possible. It may be that the greater difficulty of the
vocabulary for the Ontario students taking the French-language version resulted in slower responding.
However, it is also possible that other factors, such as experience with similar tests or a lesser propensity
to guess, contributed to a different test-taking approach.
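The difference between the two ways of computing percent correct can be illustrated with invented numbers. The Python sketch below uses two hypothetical groups of 100 students each (not SAIP data) to show that scoring non-attempts as wrong, as the DIF analyses did, and conditioning on attempts can rank the groups in opposite orders; the function name `proportions_correct` is illustrative.

```python
def proportions_correct(responses):
    """Return (p_overall, p_attempted) for one item.

    responses: list with 1 (correct), 0 (wrong), or None (not attempted).
    p_overall scores every non-attempt as wrong; p_attempted uses only attempts.
    """
    attempted = [r for r in responses if r is not None]
    n_correct = sum(attempted)
    p_overall = n_correct / len(responses)
    p_attempted = n_correct / len(attempted) if attempted else float("nan")
    return p_overall, p_attempted


# Hypothetical groups of 100 students each
group_a = [1] * 40 + [0] * 40 + [None] * 20   # 80 attempt, 40 correct
group_b = [1] * 30 + [0] * 20 + [None] * 50   # 50 attempt, 30 correct

print(proportions_correct(group_a))  # (0.4, 0.5)
print(proportions_correct(group_b))  # (0.3, 0.6)
```

Group A looks stronger when omits count as wrong, while Group B looks stronger among those who responded, which mirrors the pattern described for the later items.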

H. Jodouin, an on-site observer of the administration of the SAIP assessments in both
English-language and French-language schools, notes that students taking the English-language version
tend to be encouraged to attempt as many items as possible and to give their best guess on the
multiple-choice items. According to him, administrators of the French-language version tend to remind
students that they should be able to justify their answers (H. Jodouin, personal communication, 13 January
2003). Other Ontario educators have suggested that students taking the French-language version may work
more slowly on items because their teachers and textbooks emphasise careful and thorough work.

Results from the Second International Mathematics Study (SIMS) revealed large differences in 
the number of items attempted across countries (Schmidt, Wolfe, & Kifer, 1992). For example, students 
in France showed the highest omit rates, with almost half of the students omitting some of the items; 
students in Thailand, in contrast, were very unlikely to omit, with most items having fewer than one 
percent omits. As Schmidt, Wolfe, and Kifer (1992) note, such differences can have a significant impact
on test results, particularly for multiple-choice items. They describe the possible effect of differences in
test-taking approach as follows:

Students in one system may, for a number of reasons, be willing to answer a multiple-choice 
question on the mathematics test even if they are not sure they really know the answer. Because 
of cultural differences, students in another system with the same level of knowledge and certainty 
might choose to not answer the question. In the first case, even if they do not know the answer, 
they have a non-zero probability of getting the item correct; such is the role of chance in multiple- 
choice testing. The students in the second culture have a zero probability of getting the item 
correct even if they know the answer. Hence, answering and knowing are not the same thing! (p. 88)

Similarities and Differences between IRT and MH Procedures 

Both IRT and MH procedures gave exactly the same result for only two items: Items 33 and 35 for the
13-year-olds at Starting Point 1. Item 33 showed DIF at the .001 level of significance in both the IRT and MH
procedures, in favour of students in English-language schools. Item 35 showed DIF only at the .05 level of
significance in both procedures, in favour of students in French-language schools. Item 33 is a short-answer
conceptual question, while Item 35 is a short-answer procedural question. Judging by their b parameters,
neither is a very difficult item, and both have somewhat low levels of discrimination, ranging from 0.37 for
the French version of Item 35 to 0.76 for the English version of Item 33. In terms of the magnitude of DIF,
Item 33 is the one that really gives cause for concern, being flagged at the .001 level of significance by both
procedures. No other item was flagged at the .001 level of significance by both procedures.

Four other items, 105 (for the 13-year-olds at Starting Point 3), 49 and 94 (for the 16-year-olds at Starting
Point 2), and 101 (for the 16-year-olds at Starting Point 3), were flagged at the .01 level of significance by the
MH procedure. These same items were flagged at the .05 level of significance by the IRT procedure, and all
except Item 49 favoured students in English-language schools. In summary, the MH approach identified more
items (five) with DIF at the .01 level or beyond, while the IRT procedure identified only one at that level. Of
these five items, two are short answer and three are multiple choice. All but one measure conceptual ability;
the exception measures procedural ability.

It is important to note that the four items flagged by the MH procedure at the .01 level of significance
were flagged by the IRT procedure only at the .05 level. If these items truly exhibit DIF, this suggests that
the IRT approach carries a higher risk than the MH procedure of failing to reject a false null hypothesis,
that is, of committing a Type II error. In a previous study, Raju, Drasgow, and Slinde (1993) compared MH
and three IRT approaches in detecting race-based (Black-White) and gender-based (Female-Male) DIF. In
the Black-White comparison, they found that the MH approach identified two items at the .001 level of
significance while the three IRT approaches identified only one item at that level. There was no overlap of
biased items between any of the IRT approaches and the MH approach, while there was 100% overlap among
the three IRT approaches. In the Female-Male comparison, the MH approach again identified one more DIF
item than the IRT approaches, but there was 100% overlap of biased items between the IRT approaches and
the MH approach.

No clear pattern emerges in the identification of DIF items by the IRT and MH procedures in relation
to their relative chi-square values. For instance, one of the five items identified by MH at the .01 level of
significance and by IRT at the .05 level of significance has a smaller MH chi-square value ($\chi^2_{MH}$) than
IRT chi-square value ($\chi^2_{diff}$). Also, of the 8 items flagged by MH at the .05 level of significance but not
flagged by IRT, 3 have higher $\chi^2_{MH}$ values than $\chi^2_{diff}$ values, while 5 have higher $\chi^2_{diff}$ values than $\chi^2_{MH}$ values.

Hills (1989), Raju (1990), and Shepard, Camilli, and Averill (1981) have suggested that the IRT
approaches are theoretically preferred because of the invariance property of IRT item parameters.
However, Hills (1989) and Prieto, Barbero, and San Luis (1997) are of the view that the MH techniques
are more practical because they are easy to implement, have an associated test of significance, and are
relatively stable with small sample sizes and short test lengths. The Prieto, Barbero, and San Luis (1997)
study also found, in contrast to Swaminathan and Rogers (1990), that the MH procedures can be effective
in detecting a relatively high proportion of nonuniform DIF.
Limitations

This study has several limitations. First, in order to obtain sufficiently complete data matrices, the
analyses were performed by age and by starting point. This meant, however, that the numbers of students
in some groups were quite small, which likely reduced the stability of the DIF analysis results. Second, the
decision to treat omitted items as wrong rests on assumptions about the test-taking approaches of the
students taking the test. The dramatically different numbers of students attempting the later items on the
French-language and the English-language versions of the test suggest that these two groups may have
been using different test-taking strategies. This possibility merits further investigation. Finally, the
correspondence between the prescribed curriculum and what is actually taught in the classroom is rarely
perfect. Although few differences were found between the mathematics curricula for the two languages of
instruction in Ontario, curriculum cannot easily be dismissed as a possible source of DIF without analysing
the actual classroom experiences of the two groups.

Conclusion 

The purpose of this study was to investigate the possible impacts not only of language, but also of 
curriculum differences, on the performance of test items for subpopulations of students. By focusing on 
Measxnement and Geometry items and students in French- and English-language schools in Ontario, we 
were able to explore such differences more closely than has been possible for studies using more 
heterogeneous sets of items and samples of students. The results demonstrate the complexity of the 
factors that contribute to how items are understood and approached by different groups of students. 

The results also suggest areas for further exploration. First, curricular differences might be better 
understood in combination with information about teachers’ classroom practices. Teachers’ academic 
training, experience, and the materials available to them might well influence practices. Such contextual 
information would help us understand how the curriculum is being understood and presented. In addition, 
an examination of the patterns of items attempted by students taking the English- and French-language 
versions suggests a difference in test-taking approaches. Further research on the test-taking approaches of
these two groups might well explain some of the differences in test results. 
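As a starting point for such research, attempt patterns like those in Figures 5 to 7 could be tallied along the lines of the Python sketch below. It assumes a hypothetical layout in which each group is a list of students' response vectors in booklet order, coded 1, 0, or None for omitted; whether the figures' "cumulative" axis uses exactly this running total is an assumption.

```python
from itertools import accumulate
from typing import List, Optional

Response = Optional[int]  # 1 = right, 0 = wrong, None = omitted (hypothetical coding)


def attempts_per_item(group: List[List[Response]]) -> List[int]:
    """Number of students in the group who attempted each item position."""
    n_items = len(group[0]) if group else 0
    return [sum(student[i] is not None for student in group) for i in range(n_items)]


def cumulative_attempts(group: List[List[Response]]) -> List[int]:
    """Running total of attempts across item positions, in booklet order."""
    return list(accumulate(attempts_per_item(group)))


toy_group = [[1, 0, 1, 1], [1, 1, 0, None], [0, 1, None, None]]  # toy data only
print(attempts_per_item(toy_group))    # [3, 3, 2, 1]
print(cumulative_attempts(toy_group))  # [3, 6, 8, 9]
```

Comparing these curves for the English- and French-language groups at each starting point would show where the groups begin to diverge in how far they progress through the test.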
Table 1

The Measurement and Geometry Assessment Domain of the 2001 SAIP Mathematics Assessment 



Criteria 



Level 1 

The student... 

1. estimates and measures lengths and areas in terms of non-conventional units. 

2. estimates and measures lengths in metres, decimetres or centimetres. 

3. demonstrates an understanding of the concept of magnitude and the concept of area. 

4. demonstrates an understanding of the concepts of interior, exterior and boundary. 

5. with the aid of concrete materials, derives certain characteristics such as colour, form and 
size for solid forms and for two-dimensional figures. 

6. demonstrates an understanding of the concept of symmetry in simple activities such as 
folding and the completion of drawings. 

Level 2 

The student... 

1. estimates and measures dimensions of familiar objects, using SI units.

2. establishes relationships between SI length units.

3. demonstrates an understanding of the notion of volume.

4. estimates and calculates areas in both conventional and non-conventional units. 

5. uses SI symbols, including the prefixes milli, centi, deci and kilo, in the context of length and 
area measurements. 

6. solves simple real-life problems using SI units of length, area and time. 

7. demonstrates an understanding of the concepts of angle and measures of an angle. 

8. develops and applies problem-solving strategies related to spatial relations. 

9. knows and uses characteristics of various solid forms for the purpose of classifying the solid 
forms. 

10. determines, starting from concrete materials, certain characteristics of solid forms and the 
two-dimensional figures that form their boundaries. 

11. constructs solid forms from simpler forms.

12. describes, draws, and classifies polygons and polyhedra according to certain of their 
properties. 

13. describes and performs single geometric transformations of translation, rotation or reflection. 



Level 3 

The student...

1. using concrete examples, demonstrates an understanding of the concept of the measure of
an angle.

2. solves real-life problems relating to angle measure, segment length, area, volume, mass and 
time. 

3. reproduces plane figures by using repetition of one of the plane geometric transformations of 
translation, rotation, reflection or similarity. 

4. constructs various plane geometric figures. 

5. identifies and uses various characteristics relating to triangles, quadrilaterals and circles. 

6. calculates area, circumference and diameter of a circle. 

7. establishes relationships between SI units (including squared and cubic units). 

Level 4 

The student... 

1. solves problems derived from real-life situations that use the characteristics of solid forms. 

2. finds the unknown elements in right-angled triangles and isosceles triangles using scale 
drawing or calculation methods, provided that the student is given the option of which 
method he/she can use. 

3. solves real-life problems using the notion and units of capacity. 

4. solves problems derived from real-life situations involving area, circumference and diameter 
of a circle. 

5. reproduces plane figures using ordered sequences of the following plane transformations: 
translation, reflection, rotation and similarity.

6. demonstrates an understanding of the concepts of isometry (congruence) and similitude 
(similarity). 

7. solves real-life problems using the concepts of isometry and similitude in triangles and 
polygons. 

Level 5 

The student...

1. calculates arc and chord lengths, sector and segment areas, associated with a circle. 

2. solves real-life problems involving length and angle relationships within the circle or the 
right-angled triangle. 

3. finds unknown elements in general triangles using scale drawing or calculation methods, 
provided that the student is given the option of the method that he/she can use. 

4. constructs a rigorous proof (not necessarily in two-column format) by stating properties, 
theorems, or corollaries involved in the solution. 
Note. From Council of Ministers of Education, Canada. (2001). School Achievement Indicators Program
Mathematics Assessment: Criteria and framework. Toronto, ON: Author.
Table 2

Measurement and Geometry Items on the Content Subtest of the 2001 SAIP Mathematics Assessment

Item Order   Item Type         Achievement Level   Target Ability
023          Multiple Choice   1                   Conceptual
024          Multiple Choice   1                   Conceptual
026          Multiple Choice   1                   Conceptual
033          Short Answer      1                   Conceptual
035          Short Answer      1                   Procedural
036          Short Answer      1                   Procedural
042          Multiple Choice   2                   Conceptual
047          Multiple Choice   2                   Problem Solving
049          Multiple Choice   2                   Conceptual
053          Multiple Choice   2                   Problem Solving
064          Short Answer      2                   Procedural
065          Short Answer      2                   Problem Solving
069          Short Answer      3                   Problem Solving
074          Short Answer      3                   Conceptual
083          Multiple Choice   4                   Conceptual
084          Multiple Choice   4                   Problem Solving
086          Multiple Choice   4                   Conceptual
088          Multiple Choice   4                   Conceptual
094          Short Answer      4                   Procedural
096          Short Answer      4                   Procedural
100          Short Answer      4                   Problem Solving
101          Multiple Choice   5                   Conceptual
105          Multiple Choice   5                   Procedural
108          Multiple Choice   5                   Conceptual
109          Multiple Choice   5                   Problem Solving
110          Multiple Choice   5                   Problem Solving
125          Short Answer      5                   Problem Solving

Note. Item classifications were provided by the Council of Ministers of Education, Canada.
Table 3

Ontario Students Responding to the Content Subtest of the 2001 SAIP Mathematics Assessment

                       English-Language Test    French-Language Test
Age   Starting Point   Number   Percent         Number   Percent
13    1                516      65.1            330      67.8
      2                214      27.0            114      23.4
      3                63       7.9             43       8.8
16    1                249      36.8            254      46.5
      2                234      34.6            190      34.8
      3                194      28.7            102      18.7



Items Exhibiting DIF According to IRT or Mantel-Haenszel Approaches

[Table body not recoverable from the source scan.]

Comparison of IRT and M-H results. 



[Content of this comparison not recoverable from the source scan.]



Figure Caption 

Figure 1. SAIP administration with placement test and starting points. 

Figure 2. English and French versions of Item 53 (released in Report on Mathematics Assessment III, 
CMEC, 2001). 

Figure 3. English and French versions of Item 108 (released in Report on Mathematics Assessment III, 
CMEC, 2001). 

Figure 4. English and French versions of Item 110 (released in Report on Mathematics Assessment III,
CMEC, 2001). 

Figure 5. Numbers of items attempted, 13- and 16-year-old students taking the English- and French- 
language versions and beginning at Starting Point 1. 

Figure 6. Numbers of items attempted, 13- and 16-year-old students taking the English- and French- 
language versions and beginning at Starting Point 2. 

Figure 7. Numbers of items attempted, 13- and 16-year-old students taking the English- and French- 
language versions and beginning at Starting Point 3. 
Figure 1. SAIP administration with placement test and starting points.
James wants to run the perimeter of a playing
field, the dimensions of which are marked in 
centimetres and metres. 



50 m 



9 000 cm 

What distance will James cover by running 
once around this field?

A) 280 cm 

B) 280 m 

C) 18 100 cm 

D) 18 100 m 



Jacques court en suivant la ligne délimitant le
contour d’un terrain de jeu dont les dimensions
sont indiquées en mètres et en centimètres.



50 m 



9 000 cm 

Quelle distance Jacques devra-t-il 
parcourir pour faire un tour complet du 
terrain de jeu? 

A) 280 cm 

B) 280 m 

C) 18 100 cm 

D) 18 100 m 



Figure 2. English and French versions of Item 53 (released in Report on Mathematics Assessment III, 
CMEC, 2001). 
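The arithmetic behind Item 53 shows why it probes unit conversion as much as perimeter. Assuming the field is a rectangle with sides labelled 50 m and 9 000 cm (the diagram itself is not reproduced here), a minimal Python check:

```python
CM_PER_M = 100

length_m = 50
width_cm = 9000

# Convert before adding: 9 000 cm = 90 m, so the perimeter is 2 * (50 + 90) m.
correct_m = 2 * (length_m + width_cm / CM_PER_M)

# Adding the raw numbers without converting gives the 18 100 distractors.
unconverted = 2 * (length_m + width_cm)

print(correct_m)    # 280.0 -> option B (280 m)
print(unconverted)  # 18100 -> options C and D
```

On this reading, option A pairs the correct number with the wrong unit, so the distractors separate conversion errors from unit-labelling errors.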
The following diagram is a house plan. All
corners are square.



Le dessin ci-dessous représente le plan d’une
maison. Tous les coins sont à angle droit.




What is the length of the diagonal d in 
terms of the variables given in the diagram? 

A) $d = \sqrt{x^2 + y^2}$

B) $d = \sqrt{w^2 + z^2}$

C) $d = \sqrt{x^2 + w^2}$

D) $d = \sqrt{y^2 + z^2}$



Laquelle des expressions suivantes permet
de calculer la longueur de la diagonale d en
function des variables indiquées sur le
dessin?

A) $d = \sqrt{x^2 + y^2}$

B) $d = \sqrt{w^2 + z^2}$

C) $d = \sqrt{x^2 + w^2}$

D) $d = \sqrt{y^2 + z^2}$



Figure 3. English and French versions of Item 108 (released in Report on Mathematics Assessment III, 
CMEC, 2001). 
A drafting student must construct a symbol.
The symbol consists of a circle of radius 30 
cm and an inscribed equilateral triangle. A 
metallic wire is used to outline the perimeter 
of the triangle. 

To the nearest centimetre, what is the 
length of metallic wire needed? 

A) 90 cm 

B) 156 cm 

C) 180 cm 

D) 188 cm 



Un étudiant en graphisme doit reproduire un
symbole. Ce symbole est formé d’un cercle de
30 cm de rayon et d’un triangle équilatéral
inscrit. Un fil métallique délimitera le contour
de ce triangle.

Quelle est, au centimètre près, la longueur
du fil métallique?

A) 90 cm 

B) 156 cm 

C) 180 cm 

D) 188 cm 



Figure 4. English and French versions of Item 110 (released in Report on Mathematics Assessment III,
CMEC, 2001). 
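Item 110 can be checked the same way. For an equilateral triangle inscribed in a circle of radius r, each side is r√3, so the wire length is 3√3·r. The Python sketch below evaluates this for r = 30 cm; the pairing of the other options with specific misconceptions is only a plausible reading, not something stated in the released item.

```python
import math

r = 30  # radius in cm

side = r * math.sqrt(3)     # side of an inscribed equilateral triangle
perimeter = 3 * side        # wire needed, about 155.9 cm
print(round(perimeter))     # 156 -> option B

# Plausible sources of the distractors (interpretation only):
print(3 * r)                   # 90  -> option A (three radii)
print(6 * r)                   # 180 -> option C (six radii, an inscribed hexagon)
print(round(2 * math.pi * r))  # 188 -> option D (circumference of the circle)
```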
Legend: 13YO English-Language; 13YO French-Language; 16YO English-Language; 16YO French-Language.



Figure 5. Numbers of items attempted, 13- and 16-year-old students taking the English- and French- 
language versions and beginning at Starting Point 1. 
Legend: 13YO English-Language; 13YO French-Language; 16YO English-Language; 16YO French-Language.



Figure 6. Numbers of items attempted, 13- and 16-year-old students taking the English- and French- 
language versions and beginning at Starting Point 2. 
Number of Items Attempted (Cumulative)

Legend: 13YO English-Language; 13YO French-Language; 16YO English-Language; 16YO French-Language.



Figure 7. Numbers of items attempted, 13- and 16-year-old students taking the English- and French-
language versions and beginning at Starting Point 3.